
Algorithm 1 MCN training. L is the loss function, Q is the reconstructed filter, λ1 and λ2 are decay factors, and N is the number of layers. Update() updates the parameters based on our update scheme.

Input: a minibatch of inputs and their labels, unbinarized filters C, modulation filters M, and learning rates η1 and η2, corresponding to C and M, respectively.
Output: updated unbinarized filters C^{t+1}, updated modulation filters M^{t+1}, and updated learning rates η1^{t+1} and η2^{t+1}.

1: {1. Computing the gradients with respect to the parameters:}
2: {1.1. Forward propagation:}
3: for k = 1 to N do
4:     Ĉ ← Binarize(C)
5:     Compute Q via Eq. 3.13–3.14
6:     Compute the convolutional features using Eq. 3.15–3.17
7: end for
8: {1.2. Backward propagation:}
9: {Note that the gradients are not binary.}
10: Compute δQ = ∂L/∂Q
11: for k = N to 1 do
12:     Compute δĈ using Eq. 3.20 and Eq. 3.22–3.23
13:     Compute δM using Eq. 3.24 and Eq. 3.26–3.27
14: end for
15: {2. Accumulating the parameter gradients:}
16: for k = 1 to N do
17:     C^{t+1} ← Update(δĈ, η1) (using Eq. 3.21)
18:     M^{t+1} ← Update(δM, η2) (using Eq. 3.25)
19:     η1^{t+1} ← λ1·η1
20:     η2^{t+1} ← λ2·η2
21: end for
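To make the forward pass (steps 3–7) concrete, the following is a minimal PyTorch-style sketch of one layer. The helper names (binarize, mcn_forward), the sign-based Binarize(), and the element-wise reconstruction Q = Ĉ ∘ M (with M reduced to a single k×k plane for brevity) are our assumptions, standing in for Eq. 3.13–3.17:

```python
import torch
import torch.nn.functional as F

def binarize(C: torch.Tensor) -> torch.Tensor:
    # Sign binarization is one common choice; the exact Binarize()
    # used in Algorithm 1 may differ.
    return torch.sign(C)

def mcn_forward(x: torch.Tensor, C: torch.Tensor, M: torch.Tensor) -> torch.Tensor:
    """Steps 4-6 of Algorithm 1 for a single layer.

    C: unbinarized filters, shape (out_ch, in_ch, k, k).
    M: modulation filter, here a single (k, k) plane broadcast over C_hat
       (the thesis uses a set of planes M_j; one plane keeps the sketch short).
    """
    C_hat = binarize(C)                # step 4: C_hat <- Binarize(C)
    Q = C_hat * M                      # step 5: reconstruct Q (assumed element-wise, Eq. 3.13-3.14)
    return F.conv2d(x, Q, padding=1)   # step 6: convolutional features (Eq. 3.15-3.17)

# Toy usage: 16 filters of size 3x3 over 3 input channels.
x = torch.randn(1, 3, 32, 32)
C = torch.randn(16, 3, 3, 3, requires_grad=True)
M = torch.rand(3, 3, requires_grad=True)
features = mcn_forward(x, C, M)        # shape (1, 16, 32, 32)
```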

where η2 is the learning rate for the modulation filters M. Furthermore, we have:

$$\frac{\partial L_S}{\partial M} = \frac{\partial L_S}{\partial Q} \cdot \frac{\partial Q}{\partial M} = \sum_{i,j} \frac{\partial L_S}{\partial Q_{ij}} \cdot \hat{C}_i, \qquad (3.26)$$

Based on Eq. 3.18, we have:

$$\frac{\partial L_M}{\partial M} = \theta \sum_{i,j} \left( C_i - \hat{C}_i \circ M_j \right) \cdot \hat{C}_i. \qquad (3.27)$$
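As a numerical sanity check, Eq. 3.26 and Eq. 3.27 can be evaluated with explicit tensor operations. The shapes below and the reading Q_ij = Ĉ_i ∘ M_j are our assumptions, and the sign convention follows Eq. 3.27 as written:

```python
import torch

n, m, k, theta = 4, 2, 3, 1e-3              # filters, modulation planes, kernel size, theta
C = torch.randn(n, k, k)                     # unbinarized filters C_i
C_hat = torch.sign(C)                        # binarized filters
M = torch.rand(m, k, k)                      # modulation planes M_j
dL_dQ = torch.randn(n, m, k, k)              # delta_Q = dL_S/dQ from back-propagation

# Eq. 3.26: dL_S/dM = sum_{i,j} (dL_S/dQ_ij) * C_hat_i
dLS_dM = (dL_dQ * C_hat.unsqueeze(1)).sum(dim=(0, 1))         # shape (k, k)

# Eq. 3.27: dL_M/dM = theta * sum_{i,j} (C_i - C_hat_i o M_j) * C_hat_i
resid = C.unsqueeze(1) - C_hat.unsqueeze(1) * M               # shape (n, m, k, k)
dLM_dM = theta * (resid * C_hat.unsqueeze(1)).sum(dim=(0, 1))

dM = dLS_dM + dLM_dM                         # delta_M used in step 13 of Algorithm 1
```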

Details about the derivatives of the center loss can be found in [245]. These derivations show that MCNs can be learned with the back-propagation (BP) algorithm. The quantization process leads to a new loss function via a simple projection function, which does not affect the convergence of MCNs. We describe our algorithm in Algorithm 1.
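To round out Algorithm 1, the accumulation stage (steps 15–21) can be sketched as follows. Plain gradient descent stands in for the thesis's Update() rule (Eq. 3.21 and Eq. 3.25), λ1 and λ2 are the decay factors from the algorithm's caption, and all names and default values are ours:

```python
import torch

def update_step(C, M, dC_hat, dM, eta1, eta2, lam1=0.95, lam2=0.95):
    """Steps 16-21 of Algorithm 1 for one layer.

    dC_hat is applied directly to the unbinarized filters C, a common
    straight-through-style choice standing in for Eq. 3.21; lam1 and
    lam2 are the decay factors (values assumed here).
    """
    with torch.no_grad():
        C -= eta1 * dC_hat                   # step 17: C^{t+1} <- Update(delta_C_hat, eta1)
        M -= eta2 * dM                       # step 18: M^{t+1} <- Update(delta_M, eta2)
    return C, M, lam1 * eta1, lam2 * eta2    # steps 19-20: decay the learning rates
```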

3.4.4 Parameter Evaluation

θ and λ: Eq. 3.18 contains the hyperparameters θ and λ, which weight the filter loss and the center loss, respectively. Their effect is evaluated on CIFAR-10 with a 20-layer MCN of width 16-16-32-64; the architectural details can be found in [281] and are also shown in Fig. 3.6. The Adadelta optimization algorithm [282] is used during the training process,